Speaker Recognition Based on Fisher Discrimination Dictionary Learning

WANG Wei; HAN Jiqing; ZHENG Tieran; ZHENG Guibin; TAO Yao

doi:10.11999/JEIT 150566

cstr: 32379.14.JEIT 150566

Publication history
- Received: 2015-05-13
- Revised: 2015-09-06
- Published: 2016-02-19

Funds:

The National Natural Science Foundation of China (61071181, 61471145), The Major Research Plan of the National Natural Science Foundation of China (91120303)

Abstract: Sparse representation has been applied successfully to speaker recognition, and a well-constructed dictionary plays an important role in it. This paper introduces structured dictionary learning based on the Fisher criterion into a speaker recognition system. While the discriminative dictionary is learned, each sub-dictionary corresponds to a class label, so training samples of the same class have a small reconstruction error. At the same time, the sparse coding coefficients are constrained to have small within-class scatter and large between-class scatter. On the NIST SRE 2003 database, the proposed method achieves an Equal Error Rate (EER) of 7.62%, while an i-vector system with cosine distance scoring gives an EER of 6.7%. Fusing the two systems yields an EER of 5.07%.
Key words: Speaker recognition; Dictionary learning; Sparse representation; Fisher Discrimination (FD)
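
The abstract names two quantities that are easy to illustrate: the Fisher criterion on the sparse coding coefficients (small within-class scatter, large between-class scatter) and the Equal Error Rate used for evaluation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fisher_scatter_traces` and `equal_error_rate` are hypothetical, codes are assumed to be stored one sample per column, the scatter terms are the standard trace-based Fisher definitions (the abstract does not give the exact objective), and all data are synthetic.

```python
import numpy as np


def fisher_scatter_traces(codes, labels):
    """Return (tr(S_W), tr(S_B)) for codes stored one sample per column (assumed layout)."""
    codes = np.asarray(codes, dtype=float)
    labels = np.asarray(labels)
    global_mean = codes.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_codes = codes[:, labels == c]
        class_mean = class_codes.mean(axis=1, keepdims=True)
        # Within-class scatter: spread of each code around its own class mean.
        within += np.sum((class_codes - class_mean) ** 2)
        # Between-class scatter: class mean vs. global mean, weighted by class size.
        between += class_codes.shape[1] * np.sum((class_mean - global_mean) ** 2)
    return within, between


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for thr in thresholds:
        false_reject = np.mean(target_scores < thr)      # target trials rejected
        false_accept = np.mean(nontarget_scores >= thr)  # impostor trials accepted
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (false_accept + false_reject) / 2.0
    return eer


# Synthetic example: two well-separated "speakers" in a 5-dimensional code space.
rng = np.random.default_rng(0)
codes = np.hstack([
    rng.normal(0.0, 0.1, (5, 20)) + 1.0,
    rng.normal(0.0, 0.1, (5, 20)) - 1.0,
])
labels = np.array([0] * 20 + [1] * 20)
tr_sw, tr_sb = fisher_scatter_traces(codes, labels)

# Synthetic verification scores: one target trial falls below one impostor trial.
eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.5, 0.1, 0.2, 0.3])
```

A discriminative dictionary in the sense of the abstract would drive `tr_sw` down and `tr_sb` up on the training codes. The threshold sweep is a coarse EER estimate; evaluations on large trial lists typically interpolate between operating points instead.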
